Measurement Context Extraction from Text: Discovering Opportunities and Gaps in Earth Science
Authors
Abstract
We propose Marve, a system for extracting measurement values, units, and related words from natural language text. Marve uses conditional random fields (CRF) to identify measurement values and units, followed by a rule-based system to find related entities, descriptors, and modifiers within a sentence. Sentence tokens are represented by an undirected graphical model, and rules are based on part-of-speech and word dependency patterns connecting values and units to contextual words. Marve is unique in its focus on measurement context, and early experimentation demonstrates Marve’s ability to generate high-precision extractions with strong recall. We also discuss Marve’s role in refining measurement requirements for NASA’s proposed HyspIRI mission, a hyperspectral infrared imaging satellite that will study the world’s ecosystems. In general, our work with HyspIRI demonstrates the value of semantic measurement extractions in characterizing quantitative discussion contained in large corpora of natural language text. These extractions accelerate broad, cross-cutting research and expose scientists to new algorithmic approaches and experimental nuances. They also facilitate identification of scientific opportunities enabled by HyspIRI, leading to more efficient scientific investment and research.
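The abstract describes a two-stage pipeline: a CRF tags value and unit tokens, then rules over part-of-speech and dependency patterns attach contextual words. The paper's actual features and rule set are not given here, so the Python sketch below illustrates only the second, rule-based stage under stated assumptions. The CRF's BIO tags and a Universal Dependencies-style parse are hard-coded rather than produced by a model, and the three linking rules (`nummod` from value to unit, `nsubj` or the unit's head for the measured entity, `amod`/`compound` and `advmod` for descriptors and modifiers), as well as the names `Token`, `spans`, `dependents`, and `extract`, are hypothetical illustrations, not Marve's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Token:
    idx: int    # position in the sentence
    text: str
    pos: str    # Universal POS tag (assumed tag set)
    head: int   # index of the syntactic head, -1 for the sentence root
    dep: str    # Universal Dependencies-style relation to the head
    label: str  # BIO tag ASSUMED to come from a CRF: B-/I-VALUE, B-/I-UNIT, or O


def spans(tokens: List[Token], kind: str) -> List[List[int]]:
    """Group contiguous B-/I- tags of one kind (VALUE or UNIT) into index lists."""
    out: List[List[int]] = []
    for t in tokens:
        if t.label == f"B-{kind}":
            out.append([t.idx])
        elif t.label == f"I-{kind}" and out and out[-1][-1] == t.idx - 1:
            out[-1].append(t.idx)
    return out


def dependents(tokens: List[Token], heads: Set[int], deps: Set[str]) -> List[Token]:
    """Tokens attached to any index in `heads` through one of the `deps` labels."""
    return [t for t in tokens if t.head in heads and t.dep in deps]


def extract(tokens: List[Token]) -> List[Dict[str, object]]:
    results: List[Dict[str, object]] = []
    units = spans(tokens, "UNIT")
    for value in spans(tokens, "VALUE"):
        # Hypothetical rule 1: a value belongs to the unit span its tokens attach to.
        value_heads = {tokens[i].head for i in value}
        unit = next((u for u in units if value_heads & set(u)), None)
        if unit is None:
            continue
        # The unit token whose head lies outside the span links it to the sentence.
        unit_root = next(i for i in unit if tokens[i].head not in unit)
        parent = tokens[unit_root].head
        # Hypothetical rule 2: the measured entity is the unit's nsubj (copular
        # clause, "X was 25 degrees"); otherwise the unit's own head ("X of 300 K").
        subjects = dependents(tokens, set(unit), {"nsubj"})
        entity: Optional[Token] = (
            subjects[0] if subjects else (tokens[parent] if parent >= 0 else None)
        )
        # Hypothetical rule 3: descriptors modify the entity; modifiers modify the value.
        descriptors = dependents(tokens, {entity.idx}, {"amod", "compound"}) if entity else []
        modifiers = dependents(tokens, set(value), {"advmod"})
        results.append({
            "value": " ".join(tokens[i].text for i in value),
            "unit": " ".join(tokens[i].text for i in unit),
            "entity": entity.text if entity else None,
            "descriptors": [t.text for t in descriptors],
            "modifiers": [t.text for t in modifiers],
        })
    return results


# Hand-built (not parser-generated) analysis of
# "The mean leaf temperature was approximately 25 degrees Celsius."
SENTENCE = [
    Token(0, "The", "DET", 3, "det", "O"),
    Token(1, "mean", "ADJ", 3, "amod", "O"),
    Token(2, "leaf", "NOUN", 3, "compound", "O"),
    Token(3, "temperature", "NOUN", 7, "nsubj", "O"),
    Token(4, "was", "AUX", 7, "cop", "O"),
    Token(5, "approximately", "ADV", 6, "advmod", "O"),
    Token(6, "25", "NUM", 7, "nummod", "B-VALUE"),
    Token(7, "degrees", "NOUN", -1, "root", "B-UNIT"),
    Token(8, "Celsius", "PROPN", 7, "flat", "I-UNIT"),
    Token(9, ".", "PUNCT", 7, "punct", "O"),
]

if __name__ == "__main__":
    print(extract(SENTENCE))
```

Run on the example sentence, the sketch links `25` and `degrees Celsius` to `temperature`, with `mean` and `leaf` as descriptors and `approximately` as a modifier. Keeping tagging and linking separate mirrors the abstract's split between the statistical CRF and the rule-based context step; in this sketch, a new dependency pattern is just another rule and does not touch the tagger.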
Similar resources
A deconstructive critique of a mystical anecdote from the book Ronaq al-Majalis [The Prosperity of Meetings]
Deconstruction was first introduced in the thought of Jacques Derrida as a way of re-reading texts and questioning their presuppositions. This type of critique seeks new meanings by finding binary oppositions in the text and disrupting the superiority and domination of one side over the other, and, on the other hand, by discovering gaps and discontinuities that have arisen in the text...
Discovering the Underlying Components Affecting the Usability of IoT in Iranian Libraries: A Theory Based on Context
Objective: The aim is to discover the underlying context components of IoT usability in Iranian libraries, using a qualitative approach consistent with grounded theory. Method: This qualitative study was conducted based on grounded theory. Data were collected through semi-structured interviews with 13 faculty members in knowledge and information science, selected by purposeful and chain sampling. Responsi...
Learning Context for Text Categorization
This paper describes our work on discovering context for text document categorization. The categorization approach combines a learning paradigm known as relation extraction with a technique known as context discovery. We demonstrate the effectiveness of our categorization approach using the Reuters-21578 dataset and synthetic real world data from spor...
Reconstruction of Data Gaps in Total-Ozone Records with a New Wavelet Technique
This study introduces a new technique, based on wavelet theory as a linear time-frequency transformation, to fill and reconstruct daily Total Ozone observation records that are missing data for some days. Wavelet analysis has been used in various fields of science, especially in Earth and space physics and in processing observational data related to the Earth and space sciences. The initial c...
Structural Linguistics and Unsupervised Information Extraction
A precondition for extracting information from large text corpora is discovering the information structures underlying the text. Progress in this direction is being made in the form of unsupervised information extraction (IE). We describe recent work in unsupervised relation extraction and compare its goals to those of grammar discovery for science sublanguages. We consider what this work on gr...
Journal title:
- CoRR
Volume: abs/1710.04312
Pages: -
Publication date: 2017